Online Retail Data Set

This is a transnational data set which contains all the transactions occurring between 01/12/2010 and 09/12/2011 for a UK-based and registered non-store online retailer. The company mainly sells unique all-occasion gifts. Many customers of the company are wholesalers.

Source: https://archive.ics.uci.edu/ml/datasets/online+retail

The aim of this analysis is to use data science techniques to answer the following business questions and in so doing showcase my analytical and programming skills to solve business problems.

PART A

1. Business Metrics

    - Monthly Revenue
    - Monthly Growth Rate
    - Revenue by Country
    - Monthly Active Customers
    - Monthly Orders
    - Average Revenue per Order


2. Customer Metrics

    - Dividing customers into types: New & Existing Customers
    - Monthly Retention rate
    - Churn rate
    - Cohort-based retention rate



PART B

3.1 Market Attribution Modelling

    - Markov Chain model
    - First Touch attribution model
    - Last Touch attribution model

3.2 Comparison of number of conversions attributed to market attribution models

PART C

4.1 Customer Segmentation

    - RFM Segmentation: Divide customers into segments

4.2 Recommendation

Import Packages

PART A

1. BUSINESS METRICS

This section aims at using the combination of programming and data analysis to answer business questions on revenue, growth rate, average revenue per order, etc.

1.1 Monthly Revenue

1.2 Monthly Growth Rate

The growth rate hit its lowest point in April 2011, when revenue dropped by 28%, and we need to identify what exactly happened.

We ask questions such as: was it due to fewer active customers, or did our customers place fewer orders? Maybe they just started buying cheaper products?
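The monthly revenue and growth figures in sections 1.1 and 1.2 can be derived along the following lines. This is a minimal sketch on a toy table, not the notebook's original code: the column names `InvoiceDate`, `Quantity` and `UnitPrice` are assumed to match the UCI file, and `Revenue` and `InvoiceYearMonth` are helper columns introduced here for illustration.

```python
import pandas as pd

# Toy stand-in for the transactions table (values are made up).
df = pd.DataFrame({
    "InvoiceDate": pd.to_datetime(
        ["2011-03-05", "2011-03-20", "2011-04-02", "2011-04-18", "2011-05-09"]
    ),
    "Quantity": [10, 5, 4, 6, 12],
    "UnitPrice": [2.0, 4.0, 2.5, 3.0, 2.0],
})

# Revenue per line item, plus a YYYYMM key such as 201104.
df["Revenue"] = df["Quantity"] * df["UnitPrice"]
df["InvoiceYearMonth"] = df["InvoiceDate"].dt.year * 100 + df["InvoiceDate"].dt.month

# Total revenue per month.
monthly_revenue = df.groupby("InvoiceYearMonth")["Revenue"].sum()

# Month-over-month growth: (this month - previous month) / previous month.
# The first month has no predecessor, so its growth is NaN.
monthly_growth = monthly_revenue.pct_change()

print(monthly_revenue)
print(monthly_growth)
```

A negative value in `monthly_growth` marks a month in which revenue fell relative to the month before.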

1.3 Revenue by Country

Since the majority of the revenue comes from the United Kingdom, I will focus on this country from here on to make the analysis easier to follow.

Focus on sales from United Kingdom

1.4 Monthly Active Customers

In April, the number of monthly active customers dropped to 817 from 923 (-11.5%).

1.5 Monthly Orders

As expected, the number of orders also declined in April (from 1802 to 1622, a decrease of about 10%). The fall in the number of active customers directly contributed to the fall in the number of orders. Finally, we should also check the average revenue per order.

1.6 Monthly Average Revenue per Order

The monthly average revenue per order also dropped in April (310 to 272).
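The three UK metrics discussed in sections 1.4 to 1.6 can be computed in a single aggregation. The sketch below uses a toy table and assumes that an order corresponds to a distinct `InvoiceNo`; the notebook may define orders differently. The `Revenue` and `InvoiceYearMonth` columns are illustrative helpers.

```python
import pandas as pd

# Toy transactions (made-up values); the last row is a non-UK sale.
df = pd.DataFrame({
    "InvoiceNo": ["A1", "A1", "A2", "A3", "A4", "A5"],
    "CustomerID": [1, 1, 2, 1, 3, 4],
    "Country": ["United Kingdom"] * 5 + ["France"],
    "InvoiceYearMonth": [201103, 201103, 201103, 201104, 201104, 201104],
    "Revenue": [20.0, 10.0, 30.0, 15.0, 25.0, 99.0],
})

# Keep United Kingdom sales only.
uk = df[df["Country"] == "United Kingdom"]

# One row per month: distinct customers, distinct invoices, total revenue.
monthly = uk.groupby("InvoiceYearMonth").agg(
    active_customers=("CustomerID", "nunique"),
    orders=("InvoiceNo", "nunique"),
    revenue=("Revenue", "sum"),
)

# Average revenue per order (assumption: one order = one invoice).
monthly["avg_revenue_per_order"] = monthly["revenue"] / monthly["orders"]

print(monthly)
```

Looking at the three columns side by side makes it easier to tell whether a revenue drop comes from fewer customers, fewer orders or smaller baskets.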

2. CUSTOMER METRICS

In this section, we look at customer types such as First-time and Returning Customers: their monthly numbers, revenue and so on.

2.1 Number of customers by type
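One way to split customers into new and existing is to compare each transaction's month with the customer's first purchase month. The sketch below assumes a YYYYMM `InvoiceYearMonth` column and a `Revenue` column; `FirstPurchaseYearMonth` and `UserType` are names introduced here for illustration.

```python
import pandas as pd

# Toy transactions (made-up values).
df = pd.DataFrame({
    "CustomerID": [1, 2, 1, 3, 2, 4],
    "InvoiceYearMonth": [201101, 201101, 201102, 201102, 201103, 201103],
    "Revenue": [50.0, 20.0, 30.0, 10.0, 5.0, 40.0],
})

# First purchase month per customer, broadcast back onto every row.
df["FirstPurchaseYearMonth"] = (
    df.groupby("CustomerID")["InvoiceYearMonth"].transform("min")
)

# A customer is "New" in their first month and "Existing" afterwards.
df["UserType"] = "New"
df.loc[df["InvoiceYearMonth"] > df["FirstPurchaseYearMonth"], "UserType"] = "Existing"

# Distinct customers and total revenue per month and customer type.
customers_by_type = (
    df.groupby(["InvoiceYearMonth", "UserType"])["CustomerID"]
    .nunique()
    .unstack(fill_value=0)
)
revenue_by_type = (
    df.groupby(["InvoiceYearMonth", "UserType"])["Revenue"]
    .sum()
    .unstack(fill_value=0)
)

print(customers_by_type)
print(revenue_by_type)
```

By construction every customer in the first month of the data is labelled new, so the first month is not very informative.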

2.2 Revenue per month for new and existing customers

Number of First-time Customers by Month

Number of Returning Customers by Month

2.3 Customer Signup Date

2.4 Monthly Retention Rate

Retention rate should be monitored very closely because it indicates how sticky your service is and how well the product fits the market.

To visualize the Monthly Retention Rate, we need to calculate how many customers were retained from the previous month.

Monthly Retention Rate = Retained Customers From Prev. Month / Total Active Customers
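A minimal sketch of this calculation follows. The formula does not say which month's active customers form the denominator; the sketch assumes the previous month's, which keeps it consistent with the churn definition in section 2.5. It also assumes that the months present in the data are consecutive calendar months. All names are illustrative.

```python
import pandas as pd

# Toy activity log: one row per purchase (made-up values).
df = pd.DataFrame({
    "CustomerID": [1, 2, 3, 1, 2, 4, 1, 5],
    "InvoiceYearMonth": [201101, 201101, 201101,
                         201102, 201102, 201102,
                         201103, 201103],
})

# Customer x month matrix: True if the customer bought in that month.
active = pd.crosstab(df["CustomerID"], df["InvoiceYearMonth"]) > 0
months = sorted(active.columns)

rows = []
for prev, cur in zip(months[:-1], months[1:]):
    prev_active = int(active[prev].sum())
    # Retained = active in the previous month AND in the current month.
    retained = int((active[prev] & active[cur]).sum())
    rows.append({
        "InvoiceYearMonth": cur,
        "RetainedCustomers": retained,
        "PrevActiveCustomers": prev_active,
        "RetentionRate": retained / prev_active,
    })

retention = pd.DataFrame(rows).set_index("InvoiceYearMonth")
print(retention)
```

If the current month's active customers were used as the denominator instead, only the `prev_active` line would change.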

Monthly Retention Rate significantly jumped from June to August and went back to previous levels afterwards.

2.5 Churn Rate

Churn rate refers to the percentage of customers from the previous month who do not purchase in the next month.

To keep things simple, I define the churn window as the previous month, so that

    churn rate + retention rate = 1

Hence,

    churn rate = 1 - retention rate

2.6 Cohort-Based Retention Rate

There is another way of measuring Retention Rate, which allows us to see it for each cohort. A cohort is defined by the year-month of a customer's first purchase. We measure what percentage of the customers in each cohort are retained in each month after their first purchase. This view helps us see how recent and old cohorts differ in retention, and whether recent changes in customer experience affected new customers’ retention.

Interpretation

- 23% of customers in the January 2011 cohort (201101) repurchase in February
- 28% of them repurchase in March
- 25% repurchase in April
- ... and so on
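A cohort retention table of the kind interpreted above can be sketched as follows, assuming a YYYYMM `InvoiceYearMonth` column. The `Cohort` column and the toy data are illustrative, not taken from the notebook.

```python
import pandas as pd

# Toy activity log: one row per purchase (made-up values).
df = pd.DataFrame({
    "CustomerID": [1, 2, 3, 4, 1, 2, 5, 1, 5],
    "InvoiceYearMonth": [201101, 201101, 201101, 201101,
                         201102, 201102, 201102,
                         201103, 201103],
})

# Cohort = year-month of the customer's first purchase.
df["Cohort"] = df.groupby("CustomerID")["InvoiceYearMonth"].transform("min")

# Distinct active customers per (cohort, activity month).
counts = (
    df.groupby(["Cohort", "InvoiceYearMonth"])["CustomerID"]
    .nunique()
    .unstack()
)

# Cohort size = number of customers whose first purchase fell in that month.
cohort_size = df.groupby("Cohort")["CustomerID"].nunique()

# Share of each cohort that is active in each month (rows = cohorts).
cohort_retention = counts.div(cohort_size, axis=0)
print(cohort_retention)
```

Each row reads left to right as the share of that cohort still purchasing in later months; cells before a cohort's first month are empty (NaN).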

PART B

MARKET ATTRIBUTION MODELLING

In a typical ‘from think to buy’ customer journey, a customer goes through multiple touch points before zeroing in on the final product to buy. This is even more prominent in the case of e-commerce sales. It is also relatively easy to track the different touch points the customer encountered before making the final purchase.

As marketing moves more and more towards the consumer driven side of things, identifying the right channels to target customers has become critical for companies. This helps companies optimise their marketing spend and target the right customers in the right places.

More often than not, companies invest in the last channel that customers encounter before making the final purchase. However, this may not always be the right approach: multiple channels precede that last one and eventually drive the customer conversion. The underlying concept used to study this behavior is known as multi-channel attribution modeling.

3. CHANNEL ATTRIBUTION

An attribution model is the rule, or set of rules, that determines how credit for sales and conversions is assigned to touchpoints in conversion paths.

In this section I will use and compare three channel attribution models: First Touch, Last Touch and Markov Chain.

  1. First Touch Attribution: The First Interaction model assigns 100% credit to touchpoints that initiate conversion paths.

  2. Last Touch Attribution: The Last Interaction model assigns 100% credit to the final touchpoints (i.e., clicks) that immediately precede sales or conversions.

  3. Markov Chain: A Markov chain is a model that describes a sequence of events in which the probability of each event depends only on the previous one. In the specific case of attribution modeling, the graph is the set of all customer paths, and the events are the visits to our website or our app, defined by the channel that brought the customer to it. The model's outcome is simply the conversion (1 if the sequence ends in a conversion, 0 if it does not).

Assumption

The dataset does not contain information such as sales or marketing channels. However, for illustration purposes and to demonstrate my knowledge of growth techniques such as attribution modelling, I created random sales channels and assumed that each conversion occurs due to one or more of these channels.

3.1 Markov Chains

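The algorithm for Markov Chains can be summarized in 2 steps: calculate the transition probabilities between all states, then calculate the removal effect of each channel.

I’ll start by defining a list of all user journeys, the total number of conversions and the base-level conversion rate.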

Function that identifies all potential state transitions and outputs a dictionary containing these.

I’ll use this as an input when calculating transition probabilities

Function to calculate all transition probabilities
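The two helper functions described above might look like the sketch below. It assumes each journey is stored as a list of states that begins with `Start` and ends with either `Conversion` or `Null`; these state labels, the channel names and the function names are illustrative, not taken from the notebook.

```python
from collections import Counter, defaultdict

# Hypothetical journeys over made-up channels.
paths = [
    ["Start", "TV", "Facebook", "Conversion"],
    ["Start", "TV", "Null"],
    ["Start", "Facebook", "Conversion"],
    ["Start", "Facebook", "TV", "Null"],
]


def transition_counts(paths):
    """Count every observed state -> next-state step across all paths."""
    counts = Counter()
    for path in paths:
        for current, nxt in zip(path[:-1], path[1:]):
            counts[(current, nxt)] += 1
    return counts


def transition_probabilities(counts):
    """Normalise counts so each state's outgoing probabilities sum to 1."""
    totals = defaultdict(int)
    for (current, _), n in counts.items():
        totals[current] += n
    return {pair: n / totals[pair[0]] for pair, n in counts.items()}


trans_prob = transition_probabilities(transition_counts(paths))
for (src, dst), p in sorted(trans_prob.items()):
    print(f"{src} -> {dst}: {p:.2f}")
```

Only transitions that actually occur in the data get an entry; any pair that is absent from the dictionary has probability zero.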

Converting Transition Probabilities to DataFrame

The removal-effect calculation makes use of linear algebra and matrix manipulations, so let’s turn the transition probabilities dictionary above into a data frame (matrix).

3.1.1 Marketing Channels Interaction Probabilities

The transition probabilities heatmap below visualizes how our marketing channels interact with each other. For instance, the probability of conversion (that is, purchasing a product) after engaging with TV is 7%, the probability of a customer engaging with Facebook after watching TV is 15%, and so on.

Using historical context and the heat map above we not only gain insights into how each marketing channel is driving users towards our conversion event, but we also gain critical information around how our marketing channels are interacting with each other. Given today’s typical multi-touch conversion journeys this information can prove to be extremely valuable and allows us to optimize our multi-channel customer journeys for conversion.

3.1.2 Removal Effects Function

To figure out the contribution of a channel to our customers’ journeys from start to conversion, we use the principle of removal effect. It says that we can find the contribution of each channel by removing it and seeing how many conversions still happen without that channel in place.

Total number of conversions attributed to each channel by the Markov Chain algorithm:
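The removal effect can be sketched with a small absorbing Markov chain. The example below assumes a hypothetical transition-probability dictionary over two made-up channels, treats traffic into a removed channel as lost (as if it went to `Null`), and shares the total conversions in proportion to each channel's removal effect. It uses `numpy.linalg.solve` and is not necessarily how the notebook implements the step.

```python
import numpy as np

# Hypothetical transition probabilities between made-up channels.
trans_prob = {
    ("Start", "TV"): 0.5,
    ("Start", "Facebook"): 0.5,
    ("TV", "Facebook"): 1 / 3,
    ("TV", "Null"): 2 / 3,
    ("Facebook", "TV"): 1 / 3,
    ("Facebook", "Conversion"): 2 / 3,
}
channels = ["TV", "Facebook"]
total_conversions = 2  # assumed number of converting journeys


def conversion_probability(trans_prob, channels, removed=None):
    """P(reach Conversion from Start), optionally with one channel removed."""
    states = ["Start"] + channels
    idx = {s: i for i, s in enumerate(states)}
    q = np.zeros((len(states), len(states)))  # transient -> transient
    r = np.zeros(len(states))                 # transient -> Conversion
    for (src, dst), p in trans_prob.items():
        if src == removed or dst == removed:
            continue  # steps through the removed channel are lost
        if dst == "Conversion":
            r[idx[src]] = p
        elif dst in idx:
            q[idx[src], idx[dst]] = p
    # Absorption probabilities solve (I - Q) x = r.
    x = np.linalg.solve(np.eye(len(states)) - q, r)
    return x[idx["Start"]]


base = conversion_probability(trans_prob, channels)

# Removal effect: share of conversions lost when the channel is removed.
removal_effect = {
    c: 1 - conversion_probability(trans_prob, channels, removed=c) / base
    for c in channels
}

# Split total conversions in proportion to the removal effects.
effect_sum = sum(removal_effect.values())
attributed = {
    c: total_conversions * e / effect_sum for c, e in removal_effect.items()
}
print(removal_effect)
print(attributed)
```

In this toy chain every conversion passes through Facebook, so removing it wipes out all conversions and it receives the larger share of the credit.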

3.2 First Touch Attribution

The revenue generated by the purchase is attributed to the first marketing channel the user engaged with, on the journey towards the purchase.

3.3 Last Touch Attribution

As the name suggests, Last Touch is the attribution approach where any revenue generated is attributed to the marketing channel that a user last engaged with. While this approach has its advantage in its simplicity, we run the risk of oversimplifying our attribution, as the last touch isn’t necessarily the marketing activity that generates the purchase.
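Both single-touch rules reduce to a simple count once each journey is stored as an ordered list of channels. The sketch below uses made-up journeys and channel names and counts conversions; summing revenue instead of adding 1 would follow the same pattern.

```python
from collections import Counter

# Hypothetical journeys: ordered channel touches plus a conversion flag.
journeys = [
    (["TV", "Facebook", "Email"], 1),
    (["Facebook", "TV"], 0),
    (["Email", "Facebook"], 1),
    (["TV"], 1),
]

first_touch = Counter()
last_touch = Counter()
for touches, converted in journeys:
    if converted:
        first_touch[touches[0]] += 1   # full credit to the first channel
        last_touch[touches[-1]] += 1   # full credit to the last channel

print("First touch:", dict(first_touch))
print("Last touch:", dict(last_touch))
```

Journeys that do not convert contribute nothing under either rule, and both rules always distribute the same total number of conversions.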

3.4 Comparing First, Last and Markov Touch Attribution

PART C

4. CUSTOMER SEGMENTATION

But first off, why do we do segmentation?

Because we can’t treat every customer the same way, with the same content, the same channel and the same importance. If we do, they will find another option that understands them better.

Customers who use our platform have different needs and their own profiles, and we should adapt our actions accordingly. We can segment in different ways depending on what we are trying to achieve. If we want to increase the retention rate, for example, we can segment on churn probability and act on it. But there are also very common and useful general segmentation methods.

In this section, I will apply one of them to the data: RFM, which stands for Recency, Frequency and Monetary value segmentation.

We can have customer segments such as these:

  1. Low Value: Customers who are less active than others, are infrequent buyers/visitors and generate very low, zero or even negative revenue.

  2. Mid Value: In the middle of everything. They use our platform often (but not as much as our High Value customers), buy fairly frequently and generate moderate revenue.

  3. High Value: The group we don’t want to lose. High revenue, high frequency and low inactivity.

We need to calculate Recency, Frequency and Monetary Value (we will call it Revenue from now on) and apply unsupervised machine learning to identify different groups (clusters) for each.
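The three measures can be derived per customer in one aggregation. The sketch below makes several assumptions that may differ from the notebook: recency is the number of days between a customer's last purchase and the latest date in the data, frequency is the number of distinct invoices, and `Revenue` is a precomputed line-item revenue column.

```python
import pandas as pd

# Toy transactions (made-up values).
df = pd.DataFrame({
    "CustomerID": [1, 1, 2, 3, 3, 3],
    "InvoiceNo": ["A1", "A2", "A3", "A4", "A5", "A5"],
    "InvoiceDate": pd.to_datetime([
        "2011-11-01", "2011-12-01", "2011-10-15",
        "2011-12-05", "2011-12-09", "2011-12-09",
    ]),
    "Revenue": [20.0, 30.0, 15.0, 40.0, 10.0, 5.0],
})

# Reference date: the most recent purchase in the data set.
snapshot = df["InvoiceDate"].max()

rfm = df.groupby("CustomerID").agg(
    # Days since the customer's last purchase (lower is better).
    Recency=("InvoiceDate", lambda d: (snapshot - d.max()).days),
    # Number of distinct invoices (higher is better).
    Frequency=("InvoiceNo", "nunique"),
    # Total revenue from the customer (higher is better).
    Revenue=("Revenue", "sum"),
)
print(rfm)
```

Each of the three columns can then be clustered separately, which is what the following subsections do.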

In summary, we would prefer customers with low inactivity (a recent last purchase), high frequency and high revenue.

4.1 Recency

Function to rearrange clusters
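Clustering algorithms assign arbitrary label numbers, so the labels need to be reordered before they can be used as scores. A possible version of such a helper is sketched below; the function name `order_cluster`, its parameters and the `RecencyCluster` column are illustrative assumptions.

```python
import pandas as pd


def order_cluster(df, cluster_col, value_col, ascending):
    """Relabel clusters 0..k-1 so labels follow the mean of value_col.

    ascending=True gives label 0 to the cluster with the lowest mean;
    ascending=False gives label 0 to the cluster with the highest mean.
    """
    means = (
        df.groupby(cluster_col)[value_col]
        .mean()
        .sort_values(ascending=ascending)
    )
    mapping = {old: new for new, old in enumerate(means.index)}
    out = df.copy()
    out[cluster_col] = out[cluster_col].map(mapping)
    return out


# Toy recency values with arbitrary cluster labels (made-up values).
rfm = pd.DataFrame({
    "Recency": [300, 280, 10, 5, 120, 100],
    "RecencyCluster": [1, 1, 2, 2, 0, 0],
})

# For recency, fewer days is better, so the most inactive cluster gets 0.
ordered = order_cluster(rfm, "RecencyCluster", "Recency", ascending=False)
print(ordered)
```

For frequency and revenue, where higher is better, the same helper would be called with `ascending=True` so that a higher label always means a better customer.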

4.2 Frequency

4.3 Monetary Value (Revenue)

4.4 Overall Segmentation

The scoring above clearly shows us that customers with score 5 are our best customers whereas 0 is the worst.

To keep things simple, we name these scores:

0 to 1: Low Value

2 to 3: Mid Value

4 to 5: High Value

Overall score before segmentation
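The score bands above translate into a straightforward binning step. The sketch assumes the overall score is the sum of three ordered cluster labels; the column names and the toy values are illustrative.

```python
import pandas as pd

# Toy per-customer cluster labels, already ordered so higher = better.
rfm = pd.DataFrame({
    "CustomerID": [1, 2, 3, 4],
    "RecencyCluster": [0, 1, 2, 1],
    "FrequencyCluster": [0, 1, 2, 0],
    "RevenueCluster": [0, 0, 1, 1],
})

# Overall score = sum of the three cluster labels.
cluster_cols = ["RecencyCluster", "FrequencyCluster", "RevenueCluster"]
rfm["OverallScore"] = rfm[cluster_cols].sum(axis=1)

# Bands: 0-1 Low Value, 2-3 Mid Value, 4-5 High Value.
rfm["Segment"] = pd.cut(
    rfm["OverallScore"],
    bins=[-1, 1, 3, 5],
    labels=["Low Value", "Mid Value", "High Value"],
)
print(rfm[["CustomerID", "OverallScore", "Segment"]])
```

Scores outside the 0 to 5 range would fall outside the bins and come out as missing, so the bin edges need adjusting if more clusters are used.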

4.5 Recommendation

| Customer Segment | Task | Action Plan |
|---|---|---|
| High-value | Improve retention rate | Advertise new items and reward them, they are the best-customers. |
| Mid-value | Improve retention and frequency rate | Introduce discounts, new pricing plans and customer loyalty programs. |
| Low-value | Improve frequency | New advertisement strategy + discount plans. |